AITopics | high-level representation

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

Neural Information Processing Systems | Dec-25-2025, 13:07:35 GMT

The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones. In this paper we explore self-supervised learning of hierarchical representations of speech by applying multiple levels of Contrastive Predictive Coding (CPC). We observe that simply stacking two CPC models does not yield significant improvements over single-level architectures. Inspired by the fact that speech is often described as a sequence of discrete units unevenly distributed in time, we propose a model in which the output of a low-level CPC module is non-uniformly downsampled to directly minimize the loss of a high-level CPC module. The latter is designed to also enforce a prior of separability and discreteness in its representations by enforcing dissimilarity of successive high-level representations through focused negative sampling, and by quantization of the prediction targets. Accounting for the structure of the speech signal improves upon single-level CPC features and enhances the disentanglement of the learned representations, as measured by downstream speech recognition tasks, while resulting in a meaningful segmentation of the signal that closely resembles phone boundaries.
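As a rough illustration of two ideas named in this abstract, the sketch below shows a generic contrastive (InfoNCE-style) loss of the kind CPC minimizes, and a mean-pooling step that downsamples frames non-uniformly given segment boundaries. It is not the paper's implementation: the function names `info_nce` and `pool_segments`, the dot-product scoring, and the assumption that segment start indices are already known are all choices made for this example. How the actual model predicts boundaries is not stated in the abstract.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(prediction, positive, negatives):
    """Contrastive loss for one prediction (hypothetical helper).

    Returns -log of the softmax probability assigned to the positive
    target among the positive and all negative targets.
    """
    scores = [dot(prediction, positive)]
    scores += [dot(prediction, n) for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]


def pool_segments(frames, starts):
    """Average frames inside each segment (hypothetical helper).

    `starts` is a sorted list of segment start indices beginning with 0;
    in this sketch the boundaries are assumed to be given.
    """
    ends = starts[1:] + [len(frames)]
    pooled = []
    for s, e in zip(starts, ends):
        segment = frames[s:e]
        pooled.append([sum(col) / len(segment) for col in zip(*segment)])
    return pooled


# Three low-level frames grouped into two segments of unequal length.
frames = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
high_level = pool_segments(frames, [0, 2])

# Loss is lower when the prediction matches the positive target.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
swapped = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Because segments can have different lengths, the high-level sequence runs at a variable rate relative to the low-level frames, which is the property the abstract refers to as non-uniform downsampling.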

acoustic unit discovery, representation, variable-rate hierarchical cpc, (5 more...)

Neural Information Processing Systems

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.07)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Artificial Intelligence > Speech (0.60)

MSPred: Video Prediction at Multiple Spatio-Temporal Scales with Hierarchical Recurrent Networks

Villar-Corrales, Angel, Karapetyan, Ani, Boltres, Andreas, Behnke, Sven

arXiv.org Artificial Intelligence | Nov-9-2022

Autonomous systems not only need to understand their current environment, but should also be able to predict future actions conditioned on past states, for instance based on captured camera frames. However, existing models mainly focus on forecasting future video frames for short time-horizons, hence being of limited use for long-term action planning. We propose Multi-Scale Hierarchical Prediction (MSPred), a novel video prediction model able to simultaneously forecast future possible outcomes of different levels of granularity at different spatio-temporal scales. By combining spatial and temporal downsampling, MSPred efficiently predicts abstract representations such as human poses or locations over long time horizons, while still maintaining a competitive performance for video frame prediction. In our experiments, we demonstrate that MSPred accurately predicts future video frames as well as high-level representations (e.g. keypoints or semantics) on bin-picking and action recognition datasets, while consistently outperforming popular approaches for future frame prediction. Furthermore, we ablate different modules and design choices in MSPred, experimentally validating that combining features of different spatial and temporal granularity leads to a superior performance. Code and models to reproduce our experiments can be found in https://github.com/AIS-Bonn/MSPred.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2203.09303

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Meta's Yann LeCun strives for human-level AI

#artificialintelligence | Mar-21-2022, 16:20:23 GMT

What is the next step toward bridging the gap between natural and artificial intelligence? Scientists and researchers are divided on the answer. Yann LeCun, Chief AI Scientist at Meta and the recipient of the 2018 Turing Award, is betting on self-supervised learning, machine learning models that can be trained without the need for human-labeled examples. LeCun has been thinking and talking about self-supervised and unsupervised learning for years.

learning, lecun, self-supervised learning, (17 more...)

#artificialintelligence

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.35)

Meta's Yann LeCun is betting on self-supervised learning to unlock human-compatible AI

#artificialintelligence | Mar-15-2022, 01:50:20 GMT

This article is part of our coverage of the latest in AI research. What is the next step toward bridging the gap between natural and artificial intelligence? Scientists and researchers are divided on the answer. Yann LeCun, Chief AI Scientist at Meta and the recipient of the 2018 Turing Award, is betting on self-supervised learning, machine learning models that can be trained without the need for human-labeled examples. LeCun has been thinking and talking about self-supervised and unsupervised learning for years. But as his research and the fields of AI and neuroscience have progressed, his vision has converged around several promising concepts and trends.

intelligence, lecun, self-supervised learning, (16 more...)

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.64)

Meta's Yann LeCun on his vision for human-level AI

#artificialintelligence | Mar-8-2022, 12:15:46 GMT

This article is part of our coverage of the latest in AI research. What is the next step toward bridging the gap between natural and artificial intelligence? Scientists and researchers are divided on the answer. Yann LeCun, Chief AI Scientist at Meta and the recipient of the 2018 Turing Award, is betting on self-supervised learning, machine learning models that can be trained without the need for human-labeled examples. LeCun has been thinking and talking about self-supervised and unsupervised learning for years. But as his research and the fields of AI and neuroscience have progressed, his vision has converged around several promising concepts and trends.

learning, lecun, self-supervised learning, (17 more...)

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)

Generalizing Graph Neural Networks on Out-Of-Distribution Graphs

Fan, Shaohua, Wang, Xiao, Shi, Chuan, Cui, Peng, Wang, Bai

arXiv.org Artificial Intelligence | Nov-22-2021

Graph Neural Networks (GNNs) are proposed without considering the agnostic distribution shifts between training and testing graphs, which degrades their generalization ability in Out-Of-Distribution (OOD) settings. The fundamental reason for such degeneration is that most GNNs are developed based on the I.I.D. hypothesis. In such a setting, GNNs tend to exploit subtle statistical correlations existing in the training set for predictions, even when those correlations are spurious. However, such spurious correlations may change in testing environments, leading to the failure of GNNs. Therefore, eliminating the impact of spurious correlations is crucial for stable GNNs. To this end, we propose a general causal representation framework, called StableGNN. The main idea is to first extract high-level representations from graph data and then rely on the distinguishing ability of causal inference to help the model get rid of spurious correlations. In particular, we use a graph pooling layer to extract subgraph-based representations as high-level representations. Furthermore, we propose a causal variable distinguishing regularizer to correct the biased training distribution, so that GNNs concentrate more on the stable correlations. Extensive experiments on both synthetic and real-world OOD graph datasets verify the effectiveness, flexibility and interpretability of the proposed framework.

graph, high-level representation, representation, (17 more...)

arXiv.org Artificial Intelligence

2111.10657

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Understanding Human Judgments of Causality

Kazama, Masahiro, Suhara, Yoshihiko, Bogomolov, Andrey, Pentland, Alex 'Sandy'

arXiv.org Artificial Intelligence | Dec-18-2019

Discriminating between causality and correlation is a major problem in machine learning, and theoretical tools for determining causality are still being developed. However, people commonly make causality judgments and are often correct, even in unfamiliar domains. What are humans doing to make these judgments? This paper examines differences in human experts' and non-experts' ability to attribute causality by comparing their performances to those of machine-learning algorithms. We collected human judgments by using Amazon Mechanical Turk (MTurk) and then divided the human subjects into two groups: experts and non-experts. We also prepared expert and non-expert machine algorithms based on different training of convolutional neural network (CNN) models. The results showed that human experts' judgments were similar to those made by an "expert" CNN model trained on a large number of examples from the target domain. The human non-experts' judgments resembled the prediction outputs of the CNN model that was trained on only the small number of examples used during the MTurk instruction. We also analyzed the differences between the expert and non-expert machine algorithms based on their neural representations to evaluate the performances, providing insight into the human experts' and non-experts' cognitive abilities.

cause-effect attribution task, cognitive ability, representation, (15 more...)

arXiv.org Artificial Intelligence

1912.08998

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees

Renard, Xavier, Woloszko, Nicolas, Aigrain, Jonathan, Detyniecki, Marcin

arXiv.org Machine Learning | Jun-4-2019

Interpretable surrogates of black-box predictors trained on high-dimensional tabular datasets can struggle to generate comprehensible explanations in the presence of correlated variables. We propose a model-agnostic interpretable surrogate that provides global and local explanations of black-box classifiers to address this issue. We introduce the idea of concepts as intuitive groupings of variables that are either defined by a domain expert or automatically discovered using correlation coefficients. Concepts are embedded in a surrogate decision tree to enhance its comprehensibility. First experiments on FRED-MD, a macroeconomic database with 134 variables, show improvement in human-interpretability while accuracy and fidelity of the surrogate model are preserved.
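One way to picture the automatic discovery of concepts from correlation coefficients is the sketch below, which links any two variables whose absolute Pearson correlation reaches a threshold and treats each connected group as a concept. The threshold value, the linking rule, the function names `pearson` and `group_by_correlation`, and the toy column names are all assumptions for illustration; the abstract does not specify the paper's actual grouping procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def group_by_correlation(columns, threshold=0.8):
    """Group variable names into concepts (hypothetical helper).

    Two variables are linked when |correlation| >= threshold, and each
    connected set of linked variables becomes one concept. The default
    threshold is an arbitrary choice for this sketch.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Toy data with made-up variable names.
columns = {
    "gdp": [1.0, 2.0, 3.0, 4.0],
    "income": [2.0, 4.0, 6.0, 8.5],
    "rainfall": [3.0, 1.0, 4.0, 1.0],
}
concepts = group_by_correlation(columns)
```

A surrogate decision tree could then split on one concept at a time rather than on individual correlated variables, which is the kind of comprehensibility gain the abstract describes.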

artificial intelligence, decision tree learning, machine learning, (16 more...)

arXiv.org Machine Learning

1906.01297

Country: